Downstream analysis using the loom file generated from this notebook
--> This might be very slow. Consider passing `cache=True`, which enables much faster reading from a cache file.
Create regulons from a dataframe of enriched features. Additional columns saved: []
computing PCA
with n_comps=50
finished (0:00:00)
computing neighbors
using 'X_pca' with n_pcs = 40
finished: added to `.uns['neighbors']`
`.obsp['distances']`, distances for each pair of neighbors
`.obsp['connectivities']`, weighted adjacency matrix (0:00:17)
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:00:06)
Index(['ARID3A(+)', 'ARNTL(+)', 'ATF1(+)', 'ATF2(+)', 'ATF4(+)', 'ATF6(+)',
'BACH1(+)', 'BACH2(+)', 'BCL11B(+)', 'BCL6(+)',
...
'ZNF576(+)', 'ZNF580(+)', 'ZNF587(+)', 'ZNF600(+)', 'ZNF655(+)',
'ZNF669(+)', 'ZNF680(+)', 'ZNF708(+)', 'ZNF770(+)', 'ZSCAN26(+)'],
dtype='object', length=219)
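The regulon labels printed above use the `TF(+)` form, while the tables further down use `TF_(+)`. A minimal sketch of that renaming, and of recovering the bare TF symbol, using plain string handling; the helper names `to_underscore_label` and `tf_symbol` are hypothetical and not part of any library:

```python
import re


def to_underscore_label(name: str) -> str:
    """Turn 'ATF4(+)' into 'ATF4_(+)'; labels already converted are left alone."""
    return re.sub(r"(?<!_)\((?=[+-]\)$)", "_(", name)


def tf_symbol(name: str) -> str:
    """Strip the trailing '(+)' or '_(+)' suffix to get the TF symbol."""
    return re.sub(r"_?\([+-]\)$", "", name)


# A few labels copied from the Index printed above.
regulons = ["ARID3A(+)", "ARNTL(+)", "ATF1(+)", "ZSCAN26(+)"]
renamed = [to_underscore_label(r) for r in regulons]
symbols = [tf_symbol(r) for r in renamed]
```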
normalizing counts per cell
finished (0:00:00)
running Leiden clustering
finished: found 3 clusters and added
'leiden', the cluster labels (adata.obs, categorical) (0:00:00)
ranking genes
finished: added to `.uns['rank_genes_groups']`
'names', sorted np.recarray to be indexed by group ids
'scores', sorted np.recarray to be indexed by group ids
'logfoldchanges', sorted np.recarray to be indexed by group ids
'pvals', sorted np.recarray to be indexed by group ids
'pvals_adj', sorted np.recarray to be indexed by group ids (0:00:19)
Top up-regulated genes
| | pvals_adj | logfoldchanges | abs_lfc | scores | 0 |
|---|---|---|---|---|---|
| RACK1 | 0.000000e+00 | 1.529024 | 1.529024 | 46.862011 | 0.867624 |
| CTSW | 0.000000e+00 | 1.714248 | 1.714248 | 46.433117 | 0.834566 |
| ACTB | 0.000000e+00 | 1.352698 | 1.352698 | 46.423500 | 0.984776 |
| FCER1G | 0.000000e+00 | 2.062953 | 2.062953 | 44.366268 | 0.779759 |
| RPS17 | 0.000000e+00 | 1.198917 | 1.198917 | 42.338680 | 0.913151 |
| PFN1 | 0.000000e+00 | 1.037864 | 1.037864 | 40.129578 | 0.941424 |
| EEF1G | 0.000000e+00 | 1.320802 | 1.320802 | 40.103203 | 0.809482 |
| ATP5F1E | 0.000000e+00 | 1.268447 | 1.268447 | 37.858982 | 0.848485 |
| RPL31 | 3.750411e-295 | 0.749093 | 0.749093 | 36.888947 | 0.950123 |
| TMSB10 | 7.722950e-295 | 0.695977 | 0.695977 | 36.867626 | 0.982456 |
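The table above can be read as the ranked hits for one cluster after filtering on adjusted p-value and sorting by score. A minimal sketch of that step on four rows copied from the table; the function name `top_upregulated` and the thresholds are illustrative assumptions, not necessarily the ones used in this notebook:

```python
# Four rows copied from the cluster 0 table above.
rows = [
    {"gene": "RPL31", "pvals_adj": 3.750411e-295, "logfoldchanges": 0.749093, "scores": 36.888947},
    {"gene": "FCER1G", "pvals_adj": 0.0, "logfoldchanges": 2.062953, "scores": 44.366268},
    {"gene": "RACK1", "pvals_adj": 0.0, "logfoldchanges": 1.529024, "scores": 46.862011},
    {"gene": "TMSB10", "pvals_adj": 7.722950e-295, "logfoldchanges": 0.695977, "scores": 36.867626},
]


def top_upregulated(rows, max_padj=0.05, min_lfc=0.0, n=10):
    """Keep significant up-regulated genes, add abs_lfc, sort by score (descending)."""
    kept = [
        dict(r, abs_lfc=abs(r["logfoldchanges"]))
        for r in rows
        if r["pvals_adj"] < max_padj and r["logfoldchanges"] > min_lfc
    ]
    return sorted(kept, key=lambda r: r["scores"], reverse=True)[:n]


top = top_upregulated(rows)                 # all four genes, ordered by score
strict = top_upregulated(rows, min_lfc=1.0)  # hypothetical stricter cut-off
```

With the stricter cut-off the ribosomal and thymosin genes drop out, which may help when housekeeping genes dominate the top of the list.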
Up- and down-regulated pathways
Cluster 0 does not upregulate immune checkpoint receptor genes associated with NK exhaustion. The enriched pathways point to immune processes.
Top up-regulated genes
| | pvals_adj | logfoldchanges | abs_lfc | scores | 1 |
|---|---|---|---|---|---|
| MTRNR2L12 | 0.0 | 5.428245 | 5.428245 | 85.957787 | 0.985603 |
| TPT1 | 0.0 | 1.528299 | 1.528299 | 73.922653 | 0.997876 |
| METRNL | 0.0 | 4.278031 | 4.278031 | 70.394096 | 0.842577 |
| CLEC2B | 0.0 | 2.772367 | 2.772367 | 64.237289 | 0.950673 |
| JUND | 0.0 | 3.219890 | 3.219890 | 64.022560 | 0.884116 |
| LINC01578 | 0.0 | 3.653805 | 3.653805 | 62.212978 | 0.778145 |
| LDHA | 0.0 | 2.702229 | 2.702229 | 61.946045 | 0.938872 |
| GNAS | 0.0 | 2.926696 | 2.926696 | 61.386494 | 0.874675 |
| HIST1H4C | 0.0 | 3.066264 | 3.066264 | 61.276398 | 0.853198 |
| HLA-A | 0.0 | 1.099264 | 1.099264 | 60.495953 | 0.999056 |
Cluster 1 looks like the opposite of Cluster 0, with down-regulation of immune-related pathways and active cell transcription. Could these cells be exhausted? No separation between exhausted and resident NK cells was found in the UMAP.
Top up-regulated genes
| | pvals_adj | logfoldchanges | abs_lfc | scores | 2 |
|---|---|---|---|---|---|
| ATP5E | 0.000000e+00 | 8.162164 | 8.162164 | 49.703804 | 0.602961 |
| GNB2L1 | 0.000000e+00 | 7.683477 | 7.683477 | 47.699005 | 0.580579 |
| ATP5L | 0.000000e+00 | 7.687450 | 7.687450 | 37.945911 | 0.460399 |
| ATP5G2 | 9.237418e-304 | 8.493318 | 8.493318 | 37.404438 | 0.452135 |
| RPL7 | 5.495666e-272 | 0.595198 | 0.595198 | 35.389103 | 0.892218 |
| RARRES3 | 3.587023e-260 | 7.555178 | 7.555178 | 34.610233 | 0.420110 |
| C14orf2 | 2.127094e-249 | 7.573108 | 7.573108 | 33.884426 | 0.411157 |
| SEPT7 | 3.287299e-247 | 8.204267 | 8.204267 | 33.734791 | 0.408058 |
| TCEB2 | 1.044718e-233 | 7.623756 | 7.623756 | 32.798454 | 0.397727 |
| RPL34 | 8.169294e-222 | 0.370979 | 0.370979 | 31.949730 | 0.949036 |
ARID3A_(+) ARNTL_(+) ATF1_(+) ATF2_(+) ATF4_(+) ATF6_(+) BACH1_(+) \
2 0.394791 0.318480 0.357749 0.378298 0.353037 0.296966 0.348593
0 0.460208 0.475935 0.538349 0.478939 0.525389 0.438195 0.501305
1 0.337867 0.401619 0.379810 0.355069 0.410612 0.331232 0.438827
BACH2_(+) BCL11B_(+) BCL6_(+) ... ZNF576_(+) ZNF580_(+) ZNF587_(+) \
2 0.342087 0.336559 0.354982 ... 0.397024 0.375913 0.247703
0 0.470368 0.497121 0.356044 ... 0.507455 0.533577 0.319101
1 0.401026 0.455151 0.389804 ... 0.319025 0.387931 0.235772
ZNF600_(+) ZNF655_(+) ZNF669_(+) ZNF680_(+) ZNF708_(+) ZNF770_(+) \
2 0.331257 0.272575 0.399379 0.517156 0.361058 0.339316
0 0.418928 0.351432 0.525137 0.352200 0.220787 0.266013
1 0.353573 0.250323 0.324060 0.280323 0.227524 0.216985
ZSCAN26_(+)
2 0.280710
0 0.393594
1 0.282187
[3 rows x 219 columns]
| | mean | StDev | Ratio |
|---|---|---|---|
| 2 | 0.347 | 0.048 | 0.207 |
| 0 | 0.463 | 0.082 | 0.491 |
| 1 | 0.365 | 0.072 | 0.302 |
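The summary above appears to report, for each cluster, the mean and standard deviation of the specificity scores across regulons, plus the fraction of cells in the cluster (the three Ratio values sum to 1). That reading is an assumption. A minimal sketch under it, using only the first five scores per cluster from the printout above and hypothetical cell counts chosen so the fractions match the Ratio column:

```python
from statistics import mean, stdev

# First five specificity scores per cluster, copied from the printout above
# (a subset of the 219 regulons, so the means differ from the full table).
rss_subset = {
    "2": [0.394791, 0.318480, 0.357749, 0.378298, 0.353037],
    "0": [0.460208, 0.475935, 0.538349, 0.478939, 0.525389],
    "1": [0.337867, 0.401619, 0.379810, 0.355069, 0.410612],
}
# Hypothetical cell counts, picked only to reproduce the Ratio column.
cell_counts = {"2": 207, "0": 491, "1": 302}


def summarise(rss_by_group, counts):
    """Per-group mean and sample SD of the scores, plus the group's cell fraction."""
    total = sum(counts.values())
    return {
        group: {
            "mean": round(mean(values), 3),
            "StDev": round(stdev(values), 3),
            "Ratio": round(counts[group] / total, 3),
        }
        for group, values in rss_by_group.items()
    }


summary = summarise(rss_subset, cell_counts)
```

Whether the notebook used the sample or population standard deviation is not stated; `stdev` here is the sample version.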
| | ARID3A_(+) | ARNTL_(+) | ATF1_(+) | ATF2_(+) | ATF4_(+) | ATF6_(+) | BACH1_(+) | BACH2_(+) | BCL11B_(+) | BCL6_(+) | ... | ZNF576_(+) | ZNF580_(+) | ZNF587_(+) | ZNF600_(+) | ZNF655_(+) | ZNF669_(+) | ZNF680_(+) | ZNF708_(+) | ZNF770_(+) | ZSCAN26_(+) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MGUS_CD138nCD45p_2_ACTGCTCGTCTAGGTT-1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| MGUS_CD138nCD45p_2_CACATTTTCATAACCG-1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| MGUS_CD138nCD45p_2_CCAATCCGTGCAGACA-1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| MGUS_CD138nCD45p_2_GCATGTACAATCGAAA-1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| MGUS_CD138nCD45p_2_GCGAGAATCTGCGTAA-1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
5 rows × 219 columns
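The 0/1 table above is a binarized regulon activity matrix: a regulon is marked as on in a cell when its AUC exceeds a per-regulon threshold. A minimal sketch of that thresholding step follows. The AUC values, cell names and thresholds are hypothetical placeholders; in practice the thresholds are normally derived from the AUC distribution of each regulon rather than fixed by hand.

```python
def binarize(auc_by_cell, thresholds):
    """Return 1 where a cell's AUC for a regulon exceeds that regulon's threshold, else 0."""
    return {
        cell: {reg: int(value > thresholds[reg]) for reg, value in aucs.items()}
        for cell, aucs in auc_by_cell.items()
    }


# Hypothetical AUC values for two cells and two regulons.
auc_by_cell = {
    "cell_A": {"ATF2_(+)": 0.12, "BCL6_(+)": 0.09},
    "cell_B": {"ATF2_(+)": 0.03, "BCL6_(+)": 0.06},
}
# Hypothetical per-regulon thresholds.
thresholds = {"ATF2_(+)": 0.08, "BCL6_(+)": 0.05}

binary = binarize(auc_by_cell, thresholds)
```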
The specificity scores for the clusters reach ~0.55, which is not very high, but unlike in previous analyses the scores decrease meaningfully. The scores are also in the range of the examples provided by the group that developed SCENIC. The heatmap also roughly groups the clusters together. The AUC does not look high; probably these specific regulons are not highly expressed across the clusters. Check:
Expression of cluster 2 TFs on the UMAP
Expression of cluster 0 TFs on the UMAP
Expression of cluster 1 TFs on the UMAP
The TFs are indeed expressed in the different clusters. Looking at the TFs for cluster 2, a more granular clustering might be useful. Next, I save the TFs and regulons to a JSON file for export:
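A minimal sketch of such an export, assuming the regulons are available as a mapping from regulon name to target genes. The function name `export_regulons`, the JSON layout, the file name and the `TARGET_*` genes are all hypothetical placeholders, not the notebook's actual data or schema:

```python
import json
import tempfile
from pathlib import Path


def export_regulons(regulons, path):
    """Write the TF list and each regulon's target genes to a JSON file."""
    payload = {
        "tfs": sorted(regulons),
        "regulons": {tf: sorted(targets) for tf, targets in regulons.items()},
    }
    Path(path).write_text(json.dumps(payload, indent=2))
    return payload


# Placeholder regulons; the target gene names are not real results.
regulons = {
    "BCL6(+)": ["TARGET_C"],
    "ATF4(+)": ["TARGET_B", "TARGET_A"],
}
out_file = Path(tempfile.gettempdir()) / "regulons_example.json"
export_regulons(regulons, out_file)

# Reload to confirm the file round-trips.
reloaded = json.loads(out_file.read_text())
```

Sorting the keys and targets keeps the file stable between runs, which makes diffs easier to read.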
ARID3A_(+) ARNTL_(+) ATF1_(+) ATF2_(+) ATF4_(+) \
CD56dimCD16+ NK cells 0.657145 0.685951 0.799759 0.695849 0.819322
CD56brightCD16- NK cells 0.236673 0.223587 0.237522 0.229460 0.244489
NK cell progenitors 0.170845 0.169895 0.170256 0.170300 0.171402
ATF6_(+) BACH1_(+) BACH2_(+) BCL11B_(+) \
CD56dimCD16+ NK cells 0.531031 0.824237 0.683304 0.834986
CD56brightCD16- NK cells 0.234234 0.236023 0.259407 0.226584
NK cell progenitors 0.171323 0.170173 0.169722 0.169580
BCL6_(+) ... ZNF576_(+) ZNF580_(+) ZNF587_(+) \
CD56dimCD16+ NK cells 0.554315 ... 0.699939 0.830660 0.331016
CD56brightCD16- NK cells 0.218650 ... 0.231419 0.245246 0.204168
NK cell progenitors 0.169995 ... 0.170470 0.171334 0.170350
ZNF600_(+) ZNF655_(+) ZNF669_(+) ZNF680_(+) \
CD56dimCD16+ NK cells 0.573349 0.376303 0.732088 0.547959
CD56brightCD16- NK cells 0.218755 0.220135 0.239966 0.225025
NK cell progenitors 0.168670 0.172754 0.170127 0.169853
ZNF708_(+) ZNF770_(+) ZSCAN26_(+)
CD56dimCD16+ NK cells 0.303017 0.323380 0.416029
CD56brightCD16- NK cells 0.205915 0.215546 0.297538
NK cell progenitors 0.171675 0.169546 0.168335
[3 rows x 219 columns]
| | mean | StDev | Ratio |
|---|---|---|---|
| CD56dimCD16+ NK cells | 0.682 | 0.175 | 0.938 |
| CD56brightCD16- NK cells | 0.232 | 0.014 | 0.060 |
| NK cell progenitors | 0.170 | 0.001 | 0.001 |
The specificity scores differ considerably among the cell types; CD56dim NK cells have the highest scores. The AUC looks better, especially for the CD56dim NK population.
The specificity for the cell types looks odd: all the TFs have a narrow range of specificity within each cell type, and the scores resemble the cell-type proportions. The RSS is probably biased by the large differences in proportions.
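The suspected bias can be checked with a small calculation. The sketch below assumes the RSS is defined as 1 minus the Jensen-Shannon distance (natural log) between the normalised AUC vector of a regulon and the normalised group indicator, which is my understanding of the pySCENIC implementation but is not confirmed in this notebook. It scores a regulon with identical activity in every cell, so any difference between groups comes from group size alone. The group fractions are the Ratio values from the cell-type summary, taken here to be cell proportions; the total of 10,000 cells is a hypothetical choice.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence (natural log)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing to the divergence.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))


def rss(auc, in_group):
    """1 - JS distance between normalised AUC values and the group indicator."""
    auc_total = sum(auc)
    group_total = sum(in_group)
    p = [a / auc_total for a in auc]
    q = [g / group_total for g in in_group]
    return 1.0 - js_distance(p, q)


def rss_flat_regulon(n_cells, n_in_group):
    """RSS of a regulon with identical activity in every cell."""
    auc = [1.0] * n_cells
    in_group = [1] * n_in_group + [0] * (n_cells - n_in_group)
    return rss(auc, in_group)


# Limit for a vanishingly small group: 1 - sqrt(ln 2).
floor = 1.0 - math.sqrt(math.log(2))

for fraction in (0.001, 0.060, 0.938):
    n_in_group = round(10_000 * fraction)
    print(fraction, round(rss_flat_regulon(10_000, n_in_group), 3))
```

Under these assumptions a completely uninformative regulon already scores higher in a large group than in a small one, and a group holding about 0.1% of the cells sits near 1 - sqrt(ln 2) ≈ 0.167, close to the 0.170 reported for NK cell progenitors. This is consistent with the scores tracking group proportions rather than regulon biology.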
ARID3A_(+) ARNTL_(+) ATF1_(+) ATF2_(+) ATF4_(+) ATF6_(+) \
NK exhausted 0.463639 0.458321 0.497775 0.465179 0.503545 0.377034
Others 0.276448 0.270407 0.280775 0.275992 0.276288 0.253112
NK resident 0.430142 0.455180 0.484617 0.452651 0.495673 0.426380
BACH1_(+) BACH2_(+) BCL11B_(+) BCL6_(+) ... ZNF576_(+) \
NK exhausted 0.498114 0.434075 0.513515 0.420506 ... 0.466683
Others 0.278188 0.271988 0.279047 0.265811 ... 0.280341
NK resident 0.497253 0.494404 0.480839 0.383245 ... 0.453640
ZNF580_(+) ZNF587_(+) ZNF600_(+) ZNF655_(+) ZNF669_(+) \
NK exhausted 0.506104 0.281226 0.401912 0.314387 0.479456
Others 0.281104 0.223671 0.265765 0.240490 0.281403
NK resident 0.495097 0.292813 0.421775 0.315120 0.464933
ZNF680_(+) ZNF708_(+) ZNF770_(+) ZSCAN26_(+)
NK exhausted 0.406792 0.280230 0.280744 0.344619
Others 0.273959 0.235954 0.237725 0.242763
NK resident 0.388548 0.247604 0.277210 0.360779
[3 rows x 219 columns]
| | mean | StDev | Ratio |
|---|---|---|---|
| NK exhausted | 0.449 | 0.075 | 0.464 |
| Others | 0.269 | 0.018 | 0.105 |
| NK resident | 0.439 | 0.071 | 0.431 |
All the specificity scores reach a maximum of ~0.55, and the regulon activity does not cluster the cells in the heatmap. The AUC also looks quite low. The low values could also derive from the high number of batches in the dataset. Still, some TFs are interesting (such as STAT).
| | ARID3A_(+) | ARNTL_(+) | ATF1_(+) | ATF2_(+) | ATF4_(+) | ATF6_(+) | BACH1_(+) | BACH2_(+) | BCL11B_(+) | BCL6_(+) | ... | ZNF576_(+) | ZNF580_(+) | ZNF587_(+) | ZNF600_(+) | ZNF655_(+) | ZNF669_(+) | ZNF680_(+) | ZNF708_(+) | ZNF770_(+) | ZSCAN26_(+) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MGUS_CD138nCD45p_2 | 0.170896 | 0.169268 | 0.170189 | 0.170365 | 0.170055 | 0.170218 | 0.169829 | 0.170273 | 0.169811 | 0.171568 | ... | 0.170911 | 0.170567 | 0.170461 | 0.170509 | 0.169218 | 0.170486 | 0.175699 | 0.168928 | 0.167549 | 0.170142 |
| MGUS_CD138nCD45p_3 | 0.204317 | 0.192854 | 0.198196 | 0.194469 | 0.194977 | 0.193858 | 0.194935 | 0.188656 | 0.192533 | 0.184365 | ... | 0.198002 | 0.197983 | 0.189419 | 0.194885 | 0.192884 | 0.198591 | 0.235459 | 0.173732 | 0.182398 | 0.187878 |
| MGUS_CD138nCD45p_4 | 0.177391 | 0.173407 | 0.176083 | 0.175713 | 0.175022 | 0.174343 | 0.174629 | 0.174702 | 0.174425 | 0.171125 | ... | 0.176208 | 0.175900 | 0.173372 | 0.175044 | 0.175155 | 0.176817 | 0.185459 | 0.173322 | 0.169644 | 0.175739 |
| MGUS_CD138nCD45p_5 | 0.176882 | 0.173296 | 0.175030 | 0.174044 | 0.174607 | 0.175083 | 0.174073 | 0.173334 | 0.173842 | 0.172365 | ... | 0.175379 | 0.175268 | 0.175146 | 0.173380 | 0.175684 | 0.175124 | 0.185886 | 0.169641 | 0.170601 | 0.173757 |
| MGUS_CD138n_1 | 0.192502 | 0.185239 | 0.188397 | 0.189900 | 0.187138 | 0.189471 | 0.185122 | 0.184466 | 0.185889 | 0.182334 | ... | 0.188824 | 0.189817 | 0.180835 | 0.181638 | 0.182640 | 0.192862 | 0.213541 | 0.176922 | 0.182610 | 0.183272 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| SMM_CD138nCD45p_9 | 0.181477 | 0.177165 | 0.180174 | 0.181480 | 0.178789 | 0.178514 | 0.177638 | 0.177662 | 0.178149 | 0.179777 | ... | 0.182603 | 0.180533 | 0.179998 | 0.177006 | 0.175865 | 0.183166 | 0.198056 | 0.178603 | 0.176606 | 0.182598 |
| SMM_CD138n_3 | 0.171903 | 0.169728 | 0.170921 | 0.171797 | 0.170510 | 0.169777 | 0.170556 | 0.170774 | 0.170349 | 0.170992 | ... | 0.171229 | 0.170837 | 0.170803 | 0.170697 | 0.169331 | 0.171389 | 0.173764 | 0.169179 | 0.170918 | 0.169807 |
| SMM_CD138n_4 | 0.203920 | 0.188879 | 0.200221 | 0.199888 | 0.196458 | 0.190234 | 0.193584 | 0.195191 | 0.195148 | 0.193618 | ... | 0.201070 | 0.198847 | 0.186835 | 0.194454 | 0.192864 | 0.202913 | 0.238553 | 0.182198 | 0.184110 | 0.188803 |
| SMM_CD138n_5 | 0.172469 | 0.170303 | 0.172419 | 0.173040 | 0.171650 | 0.170579 | 0.171879 | 0.172690 | 0.171502 | 0.172554 | ... | 0.173245 | 0.172636 | 0.170128 | 0.171631 | 0.171312 | 0.173533 | 0.179210 | 0.168251 | 0.170935 | 0.172364 |
| SMM_CD138n_6 | 0.173069 | 0.172687 | 0.172206 | 0.172443 | 0.171678 | 0.171593 | 0.171539 | 0.171179 | 0.171610 | 0.172385 | ... | 0.172766 | 0.172447 | 0.169355 | 0.171222 | 0.170413 | 0.172914 | 0.178244 | 0.172118 | 0.172970 | 0.171701 |
76 rows × 219 columns
| | mean | StDev | Ratio |
|---|---|---|---|
| MGUS_CD138nCD45p_2 | 0.170 | 0.001 | 0.001 |
| MGUS_CD138nCD45p_3 | 0.195 | 0.007 | 0.018 |
| MGUS_CD138nCD45p_4 | 0.175 | 0.002 | 0.004 |
| MGUS_CD138nCD45p_5 | 0.175 | 0.002 | 0.004 |
| MGUS_CD138n_1 | 0.187 | 0.005 | 0.013 |
| ... | ... | ... | ... |
| SMM_CD138nCD45p_9 | 0.180 | 0.004 | 0.006 |
| SMM_CD138n_3 | 0.171 | 0.001 | 0.001 |
| SMM_CD138n_4 | 0.197 | 0.008 | 0.019 |
| SMM_CD138n_5 | 0.172 | 0.001 | 0.002 |
| SMM_CD138n_6 | 0.172 | 0.001 | 0.002 |
76 rows × 3 columns
| | mean | StDev | Ratio |
|---|---|---|---|
| count | 76.000000 | 76.000000 | 76.000000 |
| mean | 0.184039 | 0.004618 | 0.013158 |
| std | 0.021651 | 0.006751 | 0.023457 |
| min | 0.168000 | 0.000000 | 0.000000 |
| 25% | 0.171750 | 0.001000 | 0.002000 |
| 50% | 0.176000 | 0.002000 | 0.005500 |
| 75% | 0.187500 | 0.006000 | 0.013250 |
| max | 0.285000 | 0.040000 | 0.145000 |